
Temporal client accounting #1178

Merged
merged 6 commits into from
Feb 28, 2019
Conversation

avbelov23
Contributor

Requires #1115

#1145

@avbelov23 avbelov23 changed the title from "Storing client data in tdb" to "Temporal client accounting" on Feb 8, 2019
Contributor

@krizhanovsky krizhanovsky left a comment

A good iteration, but more work is still required.

#
# Default:
# client_lifetime 0;
#
Contributor

Please write a new chapter "Client resource accounting" in https://github.com/tempesta-tech/tempesta/wiki/Handling-clients with a description of the configuration options and a general explanation of why the configuration is needed (i.e. that Tempesta accounts client resource usage and keeps the data in a database for a configured period of time after a client's last connection shutdown).

Contributor

This is actually a very subtle topic and I want to discuss it. @avbelov23 @i-rinat @aleksostapenko @ikoveshnikov, your opinions are much appreciated.

I actually don't like the configuration parameter, not only because of configuration complexity, but also because of its dependency on other timeouts and possible configuration inconsistencies leading to security issues (also see the comment from @ikoveshnikov in #1145 (comment)).

Suppose we configure the timeout on body chunks as 10 seconds; if we then configure this parameter as 5 seconds, we effectively have no limitation on the body chunk timeout. This implies that the lifetime must be at least the greatest Frang timeout.

We have session_lifetime (https://github.com/tempesta-tech/tempesta/wiki/Sticky-Cookie#session-lifetime) - a very clear entity. Probably it makes sense to generalize session_lifetime to cover both client records and our sessions? It also seems we should require that sess_lifetime be no less than the maximum Frang timeout.

There were several discussions about what an HTTP session is, and after all it's not only a TCP connection or a cookie (some clients may not have cookies), so I think client_lifetime could also be another definition of what an HTTP session is.

Contributor

@vankoven vankoven Feb 12, 2019

We don't have any dependencies between this parameter and other *_lifetime parameters. The directive says "Life time of client accounting data after the last client connection was closed". It's closer to a garbage collection timeout: how long to wait before an unused TfwClient entry is destroyed.

But dependencies between session_lifetime and the Frang limits truly do exist.

Probably it makes sense to generalize session_lifetime to client records lifetime and our sessions?

Sounds good. There was a discussion one day that we would like to block certain clients, not the proxies they use. Probably the Frang accounting data should also be stored in TfwHttpSess?

Contributor

It could make sense in the context of the discussion https://github.com/tempesta-tech/tempesta/pull/1178/files#r255660386 . Previously we obtained TfwClient solely by IP address, i.e. we essentially limited TCP connections. That granularity didn't satisfy us, and we came to obtaining clients based on HTTP headers, so now we have sticky cookie, User-Agent, X-Forwarded-For and so on, and we should limit clients identified by their HTTP data.

And we can leave all TCP connection limiting to the nftables layer while Tempesta cares solely about HTTP limiting - clear and simple.

Contributor Author

There is nothing about sticky cookies in #1115

Contributor

Two things are still TODO:

  1. client_cfg.lifetime must be set from frang_limits (probably by a callback) as the maximum timeout from the limits plus HZ. Correspondingly, there shouldn't be a client_lifetime configuration option.

  2. Please write a new chapter "Client resource accounting" in https://github.com/tempesta-tech/tempesta/wiki/Handling-clients saying that all client accounting information is stored in a persistent Tempesta DB table with a limited lifetime, and describe the client_db configuration option.
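The first TODO item can be sketched as follows. This is only an illustration: the structure, field names, and the HZ stand-in are hypothetical; only the idea of "maximum Frang timeout plus HZ" comes from the review comment.

```c
/* Sketch: derive the client record lifetime from the configured Frang
 * timeouts instead of a standalone client_lifetime option.  All names
 * here are hypothetical stand-ins for the real Tempesta structures.
 */
typedef struct {
	unsigned int conn_tmo;      /* e.g. connection rate time frame */
	unsigned int body_tmo;      /* body chunk timeout              */
	unsigned int resp_code_tf;  /* http_resp_code_block time frame */
} HypFrangLimits;

#define HYP_HZ 250  /* stand-in for the kernel HZ constant */

static unsigned int
hyp_client_lifetime(const HypFrangLimits *fl)
{
	unsigned int max = fl->conn_tmo;

	if (fl->body_tmo > max)
		max = fl->body_tmo;
	if (fl->resp_code_tf > max)
		max = fl->resp_code_tf;
	/* Keep the record alive slightly longer than the largest limit. */
	return max + HYP_HZ;
}
```

The callback mentioned in the TODO would invoke such a computation after all Frang limits are parsed.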

Contributor

Please just add the lifetime description to the same Wiki page.

Contributor Author

@avbelov23 avbelov23 Feb 20, 2019

We decided that client_cfg.lifetime still depends on the time frame of http_resp_code_block.

if (!(cli = kmem_cache_alloc(cli_cache, GFP_ATOMIC | __GFP_ZERO))) {
spin_unlock(&hb->lock);
len = sizeof(*cli);
rec = tdb_rec_get_alloc(client_db, key, &len, &addr_eq, &is_new, &addr);
Contributor

Now cli isn't zeroed, so conn_users may contain garbage. peer is initialized, as is class_prvt when init() != NULL, so it seems only conn_users is a problem. Since you call atomic_inc_return(), it makes sense to move the call to the else branch and store 1 for new entries, avoiding double atomics.
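The suggested pattern can be illustrated with a minimal refcount sketch; C11 atomics stand in for the kernel's atomic_t, and all names are hypothetical.

```c
#include <stdatomic.h>

/* Sketch of the suggestion: a brand-new entry gets its reference count
 * stored directly as 1 (no other CPU can see it yet), and only entries
 * found in the table pay for an atomic increment.  This avoids a memset
 * plus a separate atomic increment on the "new entry" path.
 */
typedef struct {
	atomic_int conn_users;
	/* ... other client fields ... */
} HypClient;

static int
hyp_client_ref(HypClient *cli, int is_new)
{
	if (is_new) {
		/* New record is not yet visible to other CPUs: plain store. */
		atomic_store(&cli->conn_users, 1);
		return 1;
	}
	/* Existing record: take one more reference atomically. */
	return atomic_fetch_add(&cli->conn_users, 1) + 1;
}
```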

TFW_DBG_ADDR("client address", &cli->addr, TFW_NO_PORT);
} else {
if (curr_time > ent->expires) {
memset(&cli->class_prvt, 0, sizeof(cli->class_prvt));
Contributor

It's better to use bzero_fast() here - the code is always called in softirq context.

if (init)
init(cli);
}
tdb_rec_put(rec);
Contributor

We cannot call tdb_rec_put() here: you return cli, which resides in the record, to the caller, so a crash can happen. tdb_rec_put() should be called once all work with the entity is done. See also the comment for tdb_rec_get_alloc().

}

*is_new = true;
r = tdb_entry_alloc(db, key, len);
Contributor

The function interface is inconsistent now: we may exit either with the record lock held or not. tdb_rec_put() just releases the read spin lock for the bucket holding the record, which is what prevents the bucket from being split and the record from being replaced. I reckon the function must exit with the bucket read lock acquired.

Moreover, I explored __cache_add_node() and it does exactly the same: a memory area for the record is inserted into the HTrie, so it becomes accessible for parallel lookups, but we then write the record content (even with possible record extensions!) without any locks. This may lead to retrieving partially written cache records, as well as to crashes if the record is in the middle of an extension.

Please review this behavior and, if I'm right, add a TODO comment to the function and a requirement to fix this to #515.

Contributor

I see that tdb_rec_get() returns with the bucket lock held, but I couldn't find in the code whether tdb_entry_alloc() holds a lock upon return; I believe it does not. So if we return from the function because (*predicate)() produces true, the lock is held, but if we create a new record, it is not.

Contributor

The comment seems to be addressed in the wrong way.

Contributor

We must check that the returned *len is equal to (or larger than) the requested *len. tdb_entry_alloc() can allocate less than *len, in which case it returns the actual amount of allocated storage in *len.
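The checking pattern being asked for can be sketched like this; the allocator below is a toy stand-in for tdb_entry_alloc(), and all names are hypothetical.

```c
#include <stddef.h>

/* Sketch: an allocator that may grant less than requested and reports
 * the actual size back through *len, so the caller must re-check.
 */
static void *
hyp_alloc_at_most(size_t *len, size_t pool_left, char *pool)
{
	if (*len > pool_left)
		*len = pool_left;     /* shrink to what is available */
	return *len ? pool : NULL;
}

/* Returns the record only if the full requested size was granted. */
static void *
hyp_alloc_checked(size_t need, size_t pool_left, char *pool)
{
	size_t len = need;
	void *r = hyp_alloc_at_most(&len, pool_left, pool);

	if (!r || len < need)
		return NULL;          /* treat a short allocation as failure */
	return r;
}
```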

break;
nchunks++;
}
s_ip.nchunks = nchunks;
Contributor

tfw_http_req_process() is too long and overloaded now, so it's better to move the TfwStr logic to a small helper function. I think all the new code, together with the new local variables, can and should be moved to the helper.


if (TFW_STR_EMPTY(&s_ip) || tfw_addr_pton(&s_ip, &addr) != 0) {
ss_getpeername(conn->sk, &addr);
}
Contributor

@krizhanovsky krizhanovsky Feb 9, 2019

There is no sense in executing the code if we have no User-Agent, since the client was already obtained in tfw_sock_clnt_new() for the same address. UPD: this part of the comment is wrong - we do need to obtain the client for X-Forwarded-For, so the code is correct in this part.

BTW, we do not use braces for single-line IF statements unless there is a multi-line ELSE.

if (cli && conn_cli != cli) {
tfw_connection_unlink_from_peer(conn);
tfw_client_put(conn_cli);
tfw_connection_link_peer(conn, (TfwPeer *)cli);
Contributor

A very subtle place. Here we're under the socket lock acquired by the TCP caller and asserted in ss_tcp_data_ready(), so we have no concurrent ingress requests. However, what if some other parallel softirq is processing the client and connection, e.g. on processing a server response? Do we have such places which work with TfwClient on server responses? A comment about the safety of these operations is required. We had plenty of bugs with races on the connection linkage.

Contributor

@vankoven vankoven Feb 11, 2019

I'm against the change. We mustn't update TfwCliConn->peer while processing a random request. I suppose a new member TfwHttpReq->peer is required:

  • The peer in TfwCliConn->peer is the hop-by-hop peer, which can be an intermediate proxy.
  • The peer in TfwHttpReq->peer is the end-to-end peer, which is (mostly, but not always) a browser.

Take a look at the http_resp_code_block Frang directive. The directive adds a server connection hook, and it's targeted at blocking clients that, e.g., try to brute-force passwords. It was discussed in the chat some day that we don't want to block a whole intermediate proxy, only some clients behind it. So we'll need to somehow find the end-to-end peer, having only TfwHttpReq in sight.

With the changes proposed, TfwHttpReq->conn->peer will return a random end-to-end client and ban it.
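The proposed split can be sketched with minimal stand-in types; the real structures are TfwCliConn and TfwHttpReq, and everything below (names, fields, the helper) is hypothetical illustration only.

```c
/* Sketch of the proposal: keep the hop-by-hop peer on the connection
 * and an optional end-to-end peer on the request.
 */
typedef struct {
	int id;            /* opaque client identity for the sketch */
} HypClient;

typedef struct {
	HypClient *peer;   /* hop-by-hop: possibly an intermediate proxy */
} HypCliConn;

typedef struct {
	HypCliConn *conn;
	HypClient  *peer;  /* end-to-end client, or NULL if unknown */
} HypHttpReq;

/* Accounting code would prefer the end-to-end peer when it is known,
 * falling back to the connection peer otherwise.
 */
static HypClient *
hyp_req_peer(const HypHttpReq *req)
{
	return req->peer ? req->peer : req->conn->peer;
}
```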

Contributor

Agree with @ikoveshnikov: in the case of pipelined requests, the client instance in frang_resp_handler() for a particular resp->req can be set by another request (and belong to another real client), so accounting will be broken.

Contributor

However, what if some other parallel softirq is processing the client and connection, e.g. on processing a server response? Do we have such places which work with TfwClient on server responses?

This place looks dangerous, but I cannot find possible scenarios for race conditions with the client changing here in the current patch implementation. However, there are possible problems inside tfw_client_obtain() (a race with tfw_client_put()) mentioned in another comment.

TFW_HTTP_HDR_X_FORWARDED_FOR);
__TFW_HTTP_PARSE_SPECHDR_VAL(Req_HdrX_Forwarded_ForV, Req_I_XFF,
msg, __req_parse_x_forwarded_for,
TFW_HTTP_HDR_X_FORWARDED_FOR, 0);

/* 'User-Agent:*OWS' is read, process field-value. */
TFW_HTTP_PARSE_SPECHDR_VAL(Req_HdrUser_AgentV, Req_I_UserAgent,
Contributor

This whole HTTP parser change must have a unit test in t/unit/test_http_parser.c, probably a new one, checking that we get the right IP address from the header, which we then pass to tfw_client_obtain().

Contributor

There are still no unit tests, neither for the new TDB routines nor for the parser changes.

@@ -3223,27 +3223,27 @@ __req_parse_x_forwarded_for(TfwHttpMsg *hm, unsigned char *data, size_t len)
__FSM_STATE(Req_I_XFF) {
/* Eat OWS before the node ID. */
if (unlikely(IS_WS(c)))
__FSM_I_MOVE(Req_I_XFF);
__FSM_I_MOVE_fixup(Req_I_XFF, 1, 0);
/*
* Eat IP address or host name.
*
* TODO: parse/validate IP addresses and textual IDs.
* Currently we just validate separate characters, but the
* whole value may be invalid (e.g. "---[_..[[").
Contributor

The comment looks scary in the context of #1115: we definitely do not want to identify a client by garbage like ---[_..[[. I wonder why we don't have this string in the suspicious_x_forwarded_for test in t/unit/test_http_parser.c, so please add it to the test. Next, we need verification of the IP address if we use it (previously we just forwarded it to a backend without any processing), i.e. we need to implement the TODO.

Contributor

Ok, probably we can leave the parser code as is for now. But two things are required:

  1. a test, probably functional, that we really do block a client with a malformed X-Forwarded-For;

  2. block the client if we cannot translate its X-Forwarded-For value to an IP address, instead of just silently ignoring the input.


spin_unlock(cli->hb_lock);
ent = (TfwClientEntry *)cli;
ent->expires = tfw_current_timestamp() + client_cfg.lifetime;

TFW_DBG("free client: cli=%p\n", cli);
Contributor

Misleading debug message: the expiration time is set here, but the entry is not removed.

goto found;
if (user_agent)
key += hash_calc((const char *)user_agent->data,
min(user_agent->len, 256UL));
Contributor

Alignment by spaces is not used in the Linux kernel; the mixed tabs-and-spaces approach is used instead. The same applies to many lines in the patchset.

&cli->addr.sin6_addr))
goto found;
if (user_agent)
key += hash_calc((const char *)user_agent->data,
Contributor

First, summing hashes is not a good option: as far as I know, it tends to produce clustered results, so some buckets may be used more than others. Usually the XOR operation (^=) is used to combine two hash values; it gives pretty sparse results.

Secondly, hash_calc((const char *)user_agent->data, min(user_agent->len, 256UL)) will return unpredictable results if user_agent is not a plain TfwStr string. Use the tfw_hash_str() function; it was created specially for this case.
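The XOR-combination point can be demonstrated with a small sketch. The hash below is a toy FNV-1a used only for illustration; the real code would use tfw_hash_str(), and all names here are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy FNV-1a hash, standing in for the real string hash. */
static uint64_t
hyp_hash(const char *data, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* key = addr_hash ^ ua_hash: XOR spreads the combined values across
 * buckets more evenly than addition, which tends to cluster nearby sums.
 */
static uint64_t
hyp_client_key(uint64_t addr_hash, const char *ua, size_t ua_len)
{
	uint64_t key = addr_hash;

	if (ua)
		key ^= hyp_hash(ua, ua_len);
	return key;
}
```

A handy side effect of XOR is that it is self-inverse, so mixing the same value in twice cancels out, which addition does not do.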


if (!(cli = kmem_cache_alloc(cli_cache, GFP_ATOMIC | __GFP_ZERO))) {
spin_unlock(&hb->lock);
len = sizeof(*cli);
Contributor

The TfwClientEntry structure is stored in TDB, not TfwClient. The right length may need to be set:

Suggested change
len = sizeof(*cli);
len = sizeof(*ent);

spin_unlock(&hb->lock);
len = sizeof(*cli);
rec = tdb_rec_get_alloc(client_db, key, &len, &addr_eq, &is_new, &addr);
if (!rec)
return NULL;
Contributor

This means a TDB issue (lack of space?). Better to warn the user about that; please add a log or warning message.

TfwClientEntry *ent = (TfwClientEntry *)rec->data;
TfwClient *cli = &ent->cli;

return !memcmp_fast(&cli->addr, data, sizeof(cli->addr));
Contributor

The function is used to find the TfwClient structure matching the client's address. But all fields of TfwAddr are matched, including the port. This will allocate a new TfwClient structure for a connection with the same IP but a different source port, which is incorrect and insecure behaviour: bots will just open a new connection, and we will treat it as a new client.

Previously, only the IP address was used to match the TfwClient structure for the client.
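The fix being requested amounts to comparing only the address bytes and deliberately ignoring the port. A minimal sketch using a plain sockaddr_in6 (the real code works with TfwAddr; the helper name is hypothetical):

```c
#include <string.h>
#include <netinet/in.h>

/* Sketch: match a stored client against a lookup address by IP only,
 * so new connections from the same host map to the same client record.
 * A memcmp over the whole sockaddr would also compare sin6_port and
 * wrongly treat every new source port as a new client.
 */
static int
hyp_addr_eq(const struct sockaddr_in6 *stored,
	    const struct sockaddr_in6 *lookup)
{
	return !memcmp(&stored->sin6_addr, &lookup->sin6_addr,
		       sizeof(stored->sin6_addr));
}
```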


spin_unlock(cli->hb_lock);
ent = (TfwClientEntry *)cli;
ent->expires = tfw_current_timestamp() + client_cfg.lifetime;
Contributor

Need to hold a TDB lock here: ent->expires may be modified in both tfw_client_put() and tfw_client_obtain() simultaneously.


Contributor

@i-rinat i-rinat left a comment

Some changes are desired. See comments for details.

tempesta_db/core/htrie.c (outdated, resolved)
} else {
if (o)
tdb_node_visit(dbh, TDB_PTR(dbh, TDB_II2O(o)),
fn);
Contributor

@i-rinat i-rinat Feb 12, 2019

It would also be nice to have a comment about the recursion depth being hard-limited. Fixed.

tempesta_db/core/htrie.c (outdated, resolved)
}

int
tdb_walk(TdbHdr *dbh, int (*fn)(void *))
Contributor

The name "_walk" is fine, but "_foreach" is already used in tdb_tbl_foreach. I propose renaming this to tdb_htrie_foreach() for the sake of naming consistency.

@@ -931,3 +931,90 @@ tdb_htrie_exit(TdbHdr *dbh)
{
free_percpu(dbh->pcpu);
}

static int
tdb_bucket_walk(TdbHdr *dbh, TdbBucket *b, int (*fn)(void *))
Contributor

Code hidden behind macros makes me sad, but that's a personal preference. Aside from that, there are potential performance implications of such unification: instead of direct calls, they become indirect calls through a function pointer.

I haven't tested the performance impact of indirect vs. direct calls, so that's not clear. But here are some results from others: https://gist.github.com/rianhunter/0be8dc116b120ad5fdd4. Be sure to read the comments too; they are important.


@@ -128,7 +128,6 @@ tfw_cleanup(struct list_head *mod_list)

if (!tfw_runstate_is_reconfig()) {
tfw_sg_wait_release();
tfw_cli_wait_release();
}
Contributor

Redundant curly braces here.

TFW_HTTP_HDR_USER_AGENT, &s_user_agent);

if (!TFW_STR_EMPTY(&s_ip) || !TFW_STR_EMPTY(&s_user_agent)) {
TfwClient *cli = tfw_client_obtain(addr, &s_user_agent, NULL);
Contributor

We change TfwClient for TfwConn->peer here, but the real accounting and Frang filtration is done via TfwConn->sk->sk_security (set in frang_conn_new() during new sock creation), and it seems sk->sk_security is not changed anywhere.

Contributor Author

Yes, I found it myself.


sizeof(cli_addr->sin6_addr));
if (user_agent) {
char buf[256];
size_t len = tfw_str_to_cstr(user_agent, buf, sizeof(buf));
Contributor

You iterate over the TfwStr chunks twice: first when copying the string to the buffer, and a second time during hash calculation. This is not very good; the copy operation should be avoided. See tfw_hash_str(const TfwStr *str) - I've written about this function.

Contributor Author

User-Agent can be very large, so it would work slowly.

Contributor

The comment isn't addressed. It makes sense to limit how many bytes we use for hashing, but it would be appreciated if you extended tfw_hash_str() with a length of data to process from str. The copying isn't nice at all.
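The idea of hashing a chunked string in place, with a byte limit and no flat copy, can be sketched as follows. The chunk descriptor and the FNV-1a core are illustrative stand-ins for TfwStr and tfw_hash_str(); all names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for a TfwStr chunk: a pointer/length pair. */
typedef struct {
	const char *data;
	size_t len;
} HypChunk;

/* Hash a chunked string without copying it into a flat buffer,
 * stopping after `limit` bytes have been consumed.
 */
static uint64_t
hyp_hash_chunks(const HypChunk *chunks, size_t nchunks, size_t limit)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i, j, done = 0;

	for (i = 0; i < nchunks && done < limit; i++) {
		for (j = 0; j < chunks[i].len && done < limit; j++, done++) {
			h ^= (unsigned char)chunks[i].data[j];
			h *= 0x100000001b3ULL;
		}
	}
	return h;
}
```

Because the hash walks bytes in order, a split representation produces the same value as a flat one, and the cap bounds the cost for very large User-Agent values.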

Contributor

@krizhanovsky krizhanovsky left a comment

Some work is still required.



TFW_DBG2("put client %p, conn_users=%d\n",
cli, atomic_read(&cli->conn_users));

if (!atomic_dec_and_test(&cli->conn_users))
return;

spin_lock(cli->hb_lock);
rec = (TdbRec *)((void *)cli - offsetof(TdbRec, data));
tdb_rec_get_lock(rec);
Contributor

tfw_client_put() releases the bucket read lock acquired in tfw_client_obtain(), so this is a nested and wrong lock.

Contributor

@krizhanovsky krizhanovsky Feb 14, 2019

We're actually in big trouble with this solution and the whole #1115: TDB simply wasn't designed to keep directly referenced data. The current use case for the filter or cache tables is to look up an entry, do some quick operations with it, and forget about it. Client handling is very different: we keep a pointer to the stored TDB entry in many places as TfwClient *peer, i.e. once created, a client must keep the same memory address.

Currently TDB uses the read lock to protect a referenced entry from being reallocated on an HTrie index split (node bursting), so while we use a TfwClient, i.e. there is a peer pointer, the bucket cannot be split. Given that we start from only 16 items in the root node, we have plenty of splits at first, and deadlocks will occur.

A possible solution could be to dereference the HTrie index each time the peer is dereferenced, but that's too expensive. TDB must be extended to support small records with a constant memory address. Actually, large records are always placed in separate buckets; only small records are compacted into one bucket. Let's leave making TDB nice for #515 and for now do a very simple thing: make the TfwClient data structure large enough not to be processed in the small-records TDB workflow. Please don't forget to write a TODO for #515 as a comment on the TfwClientEntry data structure enhancement.

Regarding the @ikoveshnikov comment about locking to protect expires: just add a spinlock to TfwClientEntry, likely with some unused data to make it big enough, to protect against concurrent expires updates.
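The interim "make it big" workaround can be sketched like this. TDB_HTRIE_MINDREC is a real TDB constant mentioned in this thread, but its value, the 4096 threshold, and the field layout below are assumptions for illustration.

```c
#include <stddef.h>

/* Sketch of the interim fix: pad the stored entry past the small-record
 * threshold so TDB always places it in its own bucket and its memory
 * address stays constant while peer pointers reference it.
 */
#define HYP_TDB_HTRIE_MINDREC 4096  /* assumed value, see tdb headers */

typedef struct {      /* stand-in for TfwClient */
	char opaque[256];
} HypClient;

typedef struct {
	HypClient cli;
	long expires;
	/* TODO #515: remove once TDB supports small pinned records. */
	char __padding[HYP_TDB_HTRIE_MINDREC];
} HypClientEntry;

/* Compile-time guarantee that the entry escapes the small-record path. */
_Static_assert(sizeof(HypClientEntry) > HYP_TDB_HTRIE_MINDREC,
	       "entry must exceed the small-record threshold");
```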

* @return pointer to record with acquired bucket lock if the record is
* found and create TDB entry without acquired locks otherwise.
*
* TODO #515 rework the function is lock-free way
Contributor

Should be "in a lock-free way", and use a dot at the end of sentences. Please be more accurate in comments.


if (cli && cli != conn_cli)
req->peer = cli;
else
tfw_client_put(cli);
Contributor

There will be a crash if cli == NULL.

Contributor

@krizhanovsky krizhanovsky left a comment

The locks need to be adjusted, and then it would be good enough to merge.

TFW_DBG2("client %p, conn_users=%d\n", cli, conn_users);

if (!is_new)
tdb_rec_put(rec);

spin_unlock(&hb->lock);
Contributor

Why do we put the record? If there are two connections in the system referring to the same client, then they both must keep read locks on the TDB bucket containing the client record.

What's the paired lock acquisition for this spin_unlock(&hb->lock)?

Contributor Author

@avbelov23 avbelov23 Feb 20, 2019

The record doesn't change its location in TDB, since it is larger than TDB_HTRIE_MINDREC, and we need to unlock the bucket with the client as soon as possible.



HLIST_HEAD_INIT,
}
};
CliHashBucket cli_hash[CLI_HASH_SZ];
Contributor

Why is cli_hash here again?

Contributor

@krizhanovsky krizhanovsky left a comment

Looks good to me; only several cleanups are required.

@@ -1603,6 +1604,8 @@ tfw_cfgop_frang_rsp_code_block(TfwCfgSpec *cs, TfwCfgEntry *ce, FrangCfg *conf)
|| frang_parse_ushort(ce->vals[ce->val_n - 1], &cb->tf))
return -EINVAL;

tfw_client_set_lifetime(cb->tf);
Contributor

OK, this is the only limit which influences the client lifetime. However, it's confusing to see the time frame of a particular limit defining the client lifetime, so please write a comment here that we need the maximum time frame employed by the whole limiting logic. Limits which keep working after client connection termination are the ones of interest; there is only one such limit, so this is why we have what we have.

Contributor

Agree here, better to place it in (TfwCfgSpecChild *)(tfw_vhost_specs["frang_limits"].spec_ext)->finish_hook() and in the finish hooks for frang limits inside vhost definitions. But it's really the only limit that affects the client lifetime. While that is so, it's OK to have the function here.

Contributor Author

This will lead to duplication of the timeout collection code, since http_resp_code_block can also appear in locations.

spinlock_t lock;
} CliHashBucket;
TfwClient cli;
spinlock_t hb_lock;
Contributor

hb means hash bucket, so just lock is enough. Also please align client_cfg members above as you did for this structure.


/**
* Called when a client socket is closed.
*/
void
tfw_client_put(TfwClient *cli)
{
TfwClientEntry *ent;

TFW_DBG2("put client %p, conn_users=%d\n",
cli, atomic_read(&cli->conn_users));

if (!atomic_dec_and_test(&cli->conn_users))
Contributor

conn_users was actually connection users, which is not true any more: now we increase the counter for requests as well, so it should become just users. Next, the field is used only in client.c, where we have TfwClientEntry managing TfwClient placement, synchronization and lifetime, so it should be moved to TfwClientEntry. While the users counter is used only near the spin lock, it still makes sense to keep it atomic: it's still cheaper to atomically decrement the counter than to do this under the spin lock in most cases.

int conn_users;

if (!memcmp_fast(&cli->addr.sin6_addr, &addr->sin6_addr,
sizeof(cli->addr.sin6_addr))) {
Contributor

In multi-line for, if and while statements we place { on the next line. Also, since we have an 80-character line limit, we prefer to keep the statement body shorter, i.e. in this case it should be:

if (memcmp_fast(&cli->addr.sin6_addr, &addr->sin6_addr,
                             sizeof(cli->addr.sin6_addr)))
       return false;
 
spin_lock(&ent->hb_lock); 
.....

Contributor

@aleksostapenko aleksostapenko left a comment

Looks good, with some minor cleanups.

}

static int
tfw_http_req_cient_link(TfwConn *conn, TfwHttpReq *req)
Contributor

*_cient_* -> *_client_*

time_t expires;
spinlock_t lock;
atomic_t users;
} TfwClientEntry;
Contributor

@aleksostapenko aleksostapenko Feb 22, 2019

Explanatory comments for structure fields are appreciated, especially for the lock field (which protects against a tfw_client_obtain()/tfw_client_put() race condition), since we now have a rather complicated synchronization case in tfw_client_obtain() -> tdb_rec_get_alloc() with three separate nested locks protecting different resources.
In this context it would also be nice to add a comment for the global get_alloc_lock (in tempesta_db/core/main.c), which now provides atomic execution of the get/alloc TDB operation and protection against races during instances' constructor execution.

Contributor

This comment has become unresolved again. New fields were added but were not documented.

Contributor

@i-rinat i-rinat left a comment

Looks fine overall. However, I'd like the issue with sizeof(TfwClientEntry) fixed.

/*
* The recursion depth being hard-limited.
* The function has the deepest nesting 16 and uses only
* about 16 bytes of stack.
Contributor

BTW, I looked into the instructions generated, and found that 72 bytes of stack were used:

00000000000000a0 <tdb_htrie_node_visit>:
      a0:       41 57                   push   %r15
      a2:       41 56                   push   %r14
      a4:       4c 8d 7e 40             lea    0x40(%rsi),%r15
      a8:       41 55                   push   %r13
      aa:       41 54                   push   %r12
      ac:       49 89 fe                mov    %rdi,%r14
      af:       55                      push   %rbp
      b0:       53                      push   %rbx
      b1:       49 89 f5                mov    %rsi,%r13
      b4:       48 83 ec 18             sub    $0x18,%rsp
      b8:       48 89 14 24             mov    %rdx,(%rsp)

Times 16 is still just over 1k, which is fine.

}

*is_new = true;
r = tdb_entry_alloc(db, key, len);
Contributor

Must check that the returned *len is equal to (or larger than) the requested length. The function tdb_entry_alloc() can allocate less than *len and then returns the actual amount of allocated storage in *len.

spin_unlock(&hb->lock);
}
client_db = tdb_open(client_cfg.db_path, client_cfg.db_size,
sizeof(TfwClient), numa_node_id());
Contributor

Shouldn't sizeof(TfwClientEntry) be here?

unsigned int nchunks;

/*
* If a client work through a forward proxy, then a proxy can pass it's
Contributor

"...client works through...", "..can pass its IP address..."

NULL);
if (cli) {
if (cli != conn_cli)
req->peer = cli;
Contributor

@i-rinat i-rinat Feb 22, 2019

req->peer most probably was obtained by calling tfw_client_obtain(). Since this assignment loses req->peer, shouldn't it release it by calling tfw_client_put()?

Contributor

I confused it with conn->peer, which is set for every connection. Unlike that, req->peer is only set here, so there is no need to release the previous value.

@avbelov23 avbelov23 force-pushed the ab-client_tdb branch 2 times, most recently from fd3215c to fd1d9db on February 25, 2019 14:08

if (!memcmp_fast(&cli->addr.sin6_addr, &addr->sin6_addr,
sizeof(cli->addr.sin6_addr)))
return false;
Contributor

Two issues here. I believe the intention was "if the IP address doesn't match, proceed to the next entry". But it has quite the opposite meaning, since memcmp_fast() returns 0 if both buffers are equal. The issue was introduced in one of the recent rebases.

The other issue is the comparison function. A TfwClient is to be obtained by three arguments: source IP address, XFF IP address and User-Agent name. It's rather likely that TfwClient entries for the same IP address will be placed in the same TDB bucket, e.g. when the source IPs are equal, the XFF IPs are equal and the first 256 bytes of the User-Agent header are equal. In that case all such clients will be mapped to the same TfwClient entry.

key ^= hash_calc((const char *)&cli_addr->sin6_addr,
sizeof(cli_addr->sin6_addr));
if (user_agent)
key ^= tfw_hash_str_len(user_agent, 256);
Contributor

Just a question. The exact place in TDB where the entry will be stored is controlled by information in the request. An attacker can modify the request (a single User-Agent header) to create any desired collision, so an attacker can build very long collision chains for some good clients and decrease overall performance for legitimate clients.

If I am right and this behaviour is considered dangerous, then add a TODO comment to replace ^ cli_addr ^ user_agent with a secondary TDB index. #733.

Contributor Author

Changing only the User-Agent will change the hash and will create entries in different buckets. Another thing is that a new attack vector is possible through clogging of the client database.

@@ -41,7 +41,7 @@ tfw_connection_init(TfwConn *conn)
void
tfw_connection_link_peer(TfwConn *conn, TfwPeer *peer)
{
BUG_ON(conn->peer || !list_empty(&conn->list));
BUG_ON(!list_empty(&conn->list));
Contributor

This should be reverted

@@ -367,6 +368,7 @@ typedef struct {
* @vhost - virtual host for the request;
* @location - URI location;
* @sess - HTTP session descriptor, required for scheduling;
* @peer - end-to-end peer;
Contributor

Need to highlight that TfwConnection->peer is the hop-by-hop peer. Also, please add that the peer is not set if the hop-by-hop peer and the end-to-end peer are the same.

spin_lock(&ent->lock);

if (curr_time > ent->expires) {
bzero_fast(&cli->class_prvt, sizeof(cli->class_prvt));
Contributor

Frang accounting data is reinitialized for expired entries, but it is never initialized for new entries. I also noticed in one of the very first revisions that zeroing opaque data can be dangerous, and it can be so in some installations. When frang obtains a client, it calls __frang_init_acc(), which inits a spinlock inside cli->class_prvt. In this patchset the spinlock can be overwritten by zeros. Normally this is not dangerous, except when CONFIG_DEBUG_LOCK_ALLOC is set.

Contributor

I was thinking about the problem more, and it seems it doesn't really exist. A client entry won't be expired while a connection is open, so no overwrite will happen.

@@ -1603,6 +1604,8 @@ tfw_cfgop_frang_rsp_code_block(TfwCfgSpec *cs, TfwCfgEntry *ce, FrangCfg *conf)
|| frang_parse_ushort(ce->vals[ce->val_n - 1], &cb->tf))
return -EINVAL;

tfw_client_set_lifetime(cb->tf);
Contributor

Agree here, better to place it in (TfwCfgSpecChild *)(tfw_vhost_specs["frang_limits"].spec_ext)->finish_hook() and in the finish hooks for frang limits inside vhost definitions. But it's really the only limit that affects the client lifetime. While that is so, it's OK to have the function here.

@avbelov23 avbelov23 force-pushed the ab-client_tdb branch 2 times, most recently from a2f8c3c to 002eb1e on February 26, 2019 16:28
Contributor

@vankoven vankoven left a comment

Good job! A little cleanup is required before merging.

time_t expires;
spinlock_t lock;
atomic_t users;
} TfwClientEntry;
Contributor

This comment has become unresolved again. New fields were added but were not documented.

spin_lock(&hb->lock);
if ((!ctx->cli_addr && memcmp_fast(&ent->cli_addr.sin6_addr, &any_addr,
sizeof(ent->cli_addr.sin6_addr))) ||
(ctx->cli_addr && memcmp_fast(&ent->cli_addr.sin6_addr,
Contributor

There are already if (cli_addr) and if (user_agent) conditions in tfw_client_obtain(). And you can init ctx->cli_addr as any_addr and ctx->user_agent as an empty string there to make the conditions in this function simpler. But it's just a matter of taste.

spin_lock(&ent->lock);

if (curr_time > ent->expires) {
bzero_fast(&cli->class_prvt, sizeof(cli->class_prvt));
Contributor

I was thinking about the problem more, and it seems it doesn't really exist. A client entry won't be expired while a connection is open, so no overwrite will happen.

@@ -367,6 +368,9 @@ typedef struct {
* @vhost - virtual host for the request;
* @location - URI location;
* @sess - HTTP session descriptor, required for scheduling;
* @peer - end-to-end peer. That the peer is not set if
Contributor

That is extra here.

};
typedef struct {
TfwClient cli;
TfwAddr cli_addr;
Contributor

cli_addr can be misleading, since cli->addr exists. The field describes the X-Forwarded-For address; should we name it xff_addr instead?

static struct {
unsigned int db_size;
const char *db_path;
unsigned int lifetime;
Contributor Author

Maybe I will also rename life to expires and tfw_client_set_lifetime to tfw_client_set_expires?

Contributor

@vankoven vankoven left a comment

Looks good to me.
